
Optimize the codegen for Span::from_expansion #140485

Merged
bors merged 1 commit into rust-lang:master from Jarcho:from_expansion_opt on May 3, 2025

Jarcho
Contributor

@Jarcho Jarcho commented Apr 29, 2025

See https://godbolt.org/z/bq65Y6bc4 for the difference. The new version has less than half as many instructions.

Also tried fully writing the function by hand:

sp.ctxt_or_parent_or_marker != 0
    && (
        sp.len_with_tag_or_marker == BASE_LEN_INTERNED_MARKER
        || sp.len_with_tag_or_marker & PARENT_TAG == 0
    )

But that was no better than this PR's current use of match_span_kind.
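
To make the hand-written predicate concrete, here is a minimal standalone sketch. The `Span` struct below is a hypothetical two-field stand-in rather than the real compiler type, and the constant values are assumptions chosen to match the names used above (a tag in the top bit of the length field, an all-ones interned marker, and a root context encoded as 0).

```rust
// Hypothetical, simplified model of the compact span encoding.
// Names follow the snippet above; the values are assumptions.
const PARENT_TAG: u16 = 0b1000_0000_0000_0000;
const BASE_LEN_INTERNED_MARKER: u16 = 0b1111_1111_1111_1111;

#[derive(Clone, Copy)]
struct Span {
    len_with_tag_or_marker: u16,
    ctxt_or_parent_or_marker: u16,
}

/// The hand-written predicate: the second field is non-zero, and it really
/// holds a context (or the interned marker) rather than a parent id.
fn from_expansion_by_hand(sp: Span) -> bool {
    sp.ctxt_or_parent_or_marker != 0
        && (sp.len_with_tag_or_marker == BASE_LEN_INTERNED_MARKER
            || sp.len_with_tag_or_marker & PARENT_TAG == 0)
}

fn main() {
    // One sample per situation the predicate has to distinguish.
    let samples = [
        ("inline, root context", Span { len_with_tag_or_marker: 5, ctxt_or_parent_or_marker: 0 }),
        ("inline, non-root context", Span { len_with_tag_or_marker: 5, ctxt_or_parent_or_marker: 7 }),
        ("inline with parent", Span { len_with_tag_or_marker: PARENT_TAG | 5, ctxt_or_parent_or_marker: 7 }),
        ("interned marker", Span { len_with_tag_or_marker: BASE_LEN_INTERNED_MARKER, ctxt_or_parent_or_marker: 7 }),
    ];
    for (name, sp) in samples {
        println!("{name}: {}", from_expansion_by_hand(sp));
    }
}
```

In this model a span that stores a parent id in the second field is always treated as not coming from an expansion, because its context is implicitly the root.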

@rustbot
Collaborator

rustbot commented Apr 29, 2025

r? @jieyouxu

rustbot has assigned @jieyouxu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 29, 2025
@Urgau
Member

Urgau commented Apr 29, 2025

Is this function so hot that it needs to be micro-optimized? Have you benchmarked it on a real-world case?

@Jarcho
Contributor Author

Jarcho commented Apr 29, 2025

There are more than 300 call sites (most from clippy) and it ends up being called more than once per expression while linting (in clippy). It's definitely used a lot and frequently called.

@Jarcho Jarcho force-pushed the from_expansion_opt branch from e332f70 to 761d0ec Compare April 29, 2025 20:48
@jieyouxu
Member

r? @petrochenkov

@rustbot rustbot assigned petrochenkov and unassigned jieyouxu Apr 30, 2025
@petrochenkov
Contributor

petrochenkov commented Apr 30, 2025

I wonder how to write this better so it doesn't look like magic.
It took me some time and manual inlining to understand why this works.

@petrochenkov
Contributor

Perhaps like this:

        let ctxt = match_span_kind! {
            self,
            // All branches here, except `InlineParent`, actually return `span.ctxt_or_parent_or_marker`,
            // this makes the code optimize very well by eliminating the `ctxt_or_parent_or_marker` comparison.
            InlineCtxt(span) => SyntaxContext::from_u16(span.ctxt),
            InlineParent(_span) => SyntaxContext::root(),
            PartiallyInterned(span) => SyntaxContext::from_u16(span.ctxt),
            Interned(_span) => SyntaxContext::from_u16(CTXT_INTERNED_MARKER),
        };
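
The reasoning in the code comment above can be checked with a small standalone model. This is only a sketch under stated assumptions: `Span`, `Kind`, and `classify` are hypothetical stand-ins for the real type and the `match_span_kind` macro, the constant values are assumed, and the root context is assumed to be encoded as 0. It compares the match-based formulation with the hand-written predicate from the PR description.

```rust
// Assumed constant values; names follow the discussion above.
const PARENT_TAG: u16 = 0x8000;
const BASE_LEN_INTERNED_MARKER: u16 = 0xFFFF;
const CTXT_INTERNED_MARKER: u16 = 0xFFFF;

#[derive(Clone, Copy)]
struct Span {
    len_with_tag_or_marker: u16,
    ctxt_or_parent_or_marker: u16,
}

// Hypothetical stand-in for the four arms of `match_span_kind!`.
enum Kind {
    InlineCtxt { ctxt: u16 },
    InlineParent,
    PartiallyInterned { ctxt: u16 },
    Interned,
}

fn classify(sp: Span) -> Kind {
    if sp.len_with_tag_or_marker != BASE_LEN_INTERNED_MARKER {
        if sp.len_with_tag_or_marker & PARENT_TAG == 0 {
            Kind::InlineCtxt { ctxt: sp.ctxt_or_parent_or_marker }
        } else {
            Kind::InlineParent
        }
    } else if sp.ctxt_or_parent_or_marker != CTXT_INTERNED_MARKER {
        Kind::PartiallyInterned { ctxt: sp.ctxt_or_parent_or_marker }
    } else {
        Kind::Interned
    }
}

/// Match-based version: every arm except `InlineParent` yields the raw
/// second field, so the optimizer can skip the marker comparison.
fn from_expansion_via_match(sp: Span) -> bool {
    let ctxt = match classify(sp) {
        Kind::InlineCtxt { ctxt } => ctxt,
        Kind::InlineParent => 0, // root context (assumed to be 0)
        Kind::PartiallyInterned { ctxt } => ctxt,
        Kind::Interned => CTXT_INTERNED_MARKER,
    };
    ctxt != 0
}

/// Hand-written predicate from the PR description.
fn from_expansion_by_hand(sp: Span) -> bool {
    sp.ctxt_or_parent_or_marker != 0
        && (sp.len_with_tag_or_marker == BASE_LEN_INTERNED_MARKER
            || sp.len_with_tag_or_marker & PARENT_TAG == 0)
}

/// Counts disagreements over representative length values and all u16 contexts.
fn count_mismatches() -> usize {
    let lens = [0u16, 1, 0x7FFE, 0x7FFF, PARENT_TAG, PARENT_TAG | 1, 0xFFFE, BASE_LEN_INTERNED_MARKER];
    let mut mismatches = 0;
    for len in lens {
        for ctxt in 0..=u16::MAX {
            let sp = Span { len_with_tag_or_marker: len, ctxt_or_parent_or_marker: ctxt };
            if from_expansion_via_match(sp) != from_expansion_by_hand(sp) {
                mismatches += 1;
            }
        }
    }
    mismatches
}

fn main() {
    println!("mismatches: {}", count_mismatches());
}
```

The `Interned` arm never inspects the interner in this model: the marker value is non-zero, so such spans are conservatively reported as coming from an expansion.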

@petrochenkov
Contributor

In any case, let's benchmark this on rustc too.
@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 30, 2025
@petrochenkov petrochenkov removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 30, 2025
bors added a commit to rust-lang-ci/rust that referenced this pull request Apr 30, 2025
Optimize the codegen for `Span::from_expansion`

@bors
Collaborator

bors commented Apr 30, 2025

⌛ Trying commit 761d0ec with merge fb65274...

@bors
Collaborator

bors commented Apr 30, 2025

☀️ Try build successful - checks-actions
Build commit: fb65274 (fb65274cfcdf4129ff7a2c2c9c7c63b4014b82d4)

@rust-timer
Collaborator

Finished benchmarking commit (fb65274): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.5% | [-2.8%, -0.3%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.5% | [-2.8%, -0.3%] | 2 |

Max RSS (memory usage)

Results (primary 0.7%, secondary 1.3%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.4% | [0.4%, 6.4%] | 6 |
| Regressions ❌ (secondary) | 3.2% | [2.6%, 3.7%] | 2 |
| Improvements ✅ (primary) | -0.5% | [-0.6%, -0.4%] | 8 |
| Improvements ✅ (secondary) | -2.4% | [-2.4%, -2.4%] | 1 |
| All ❌✅ (primary) | 0.7% | [-0.6%, 6.4%] | 14 |

Cycles

Results (primary -0.7%, secondary -2.5%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.4% | [0.4%, 0.5%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -0.9% | [-2.6%, -0.4%] | 11 |
| Improvements ✅ (secondary) | -2.5% | [-3.0%, -2.2%] | 5 |
| All ❌✅ (primary) | -0.7% | [-2.6%, 0.5%] | 13 |

Binary size

Results (primary -1.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.1% | [-1.1%, -1.1%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.1% | [-1.1%, -1.1%] | 1 |

Bootstrap: 770.233s -> 770.093s (-0.02%)
Artifact size: 365.54 MiB -> 365.45 MiB (-0.02%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 30, 2025
@petrochenkov petrochenkov added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Apr 30, 2025
@Jarcho Jarcho force-pushed the from_expansion_opt branch from 761d0ec to d9c060b Compare April 30, 2025 19:32
@Jarcho
Contributor Author

Jarcho commented Apr 30, 2025

Went with a more verbose explanation of why this version optimizes better and why it gets the right result. Also that's a little more of a benefit than I was expecting from just rustc.

@petrochenkov
Contributor

> Also that's a little more of a benefit than I was expecting from just rustc.

The clap_derive change is definitely spurious.
typenum and match-stress may be real, or may be just inlining working differently.

@petrochenkov
Contributor

@bors r+ rollup=maybe

@bors
Collaborator

bors commented Apr 30, 2025

📌 Commit d9c060b has been approved by petrochenkov

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 30, 2025
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request May 1, 2025
…chenkov

Optimize the codegen for `Span::from_expansion`

bors added a commit to rust-lang-ci/rust that referenced this pull request May 1, 2025
…iaskrgr

Rollup of 7 pull requests

Successful merges:

 - rust-lang#134034 (handle paren in macro expand for let-init-else expr)
 - rust-lang#139186 (Refactor `diy_float`)
 - rust-lang#140062 (std: mention `remove_dir_all` can emit `DirectoryNotEmpty` when concurrently written into)
 - rust-lang#140430 (Improve test coverage of HIR pretty printing.)
 - rust-lang#140485 (Optimize the codegen for `Span::from_expansion`)
 - rust-lang#140505 (linker: Quote symbol names in .def files)
 - rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses)

r? `@ghost`
`@rustbot` modify labels: rollup
Zalathar added a commit to Zalathar/rust that referenced this pull request May 1, 2025
…chenkov

Optimize the codegen for `Span::from_expansion`

bors added a commit to rust-lang-ci/rust that referenced this pull request May 1, 2025
Rollup of 11 pull requests

Successful merges:

 - rust-lang#134034 (handle paren in macro expand for let-init-else expr)
 - rust-lang#138703 (chore: remove redundant words in comment)
 - rust-lang#139186 (Refactor `diy_float`)
 - rust-lang#139343 (Change signature of File::try_lock and File::try_lock_shared)
 - rust-lang#139780 (docs: Add example to `Iterator::take` with `by_ref`)
 - rust-lang#139802 (Fix some grammar errors and hyperlinks in doc for `trait Allocator`)
 - rust-lang#140034 (simd_select_bitmask: the 'padding' bits in the mask are just ignored)
 - rust-lang#140062 (std: mention `remove_dir_all` can emit `DirectoryNotEmpty` when concurrently written into)
 - rust-lang#140485 (Optimize the codegen for `Span::from_expansion`)
 - rust-lang#140505 (linker: Quote symbol names in .def files)
 - rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses)

r? `@ghost`
`@rustbot` modify labels: rollup
Zalathar added a commit to Zalathar/rust that referenced this pull request May 1, 2025
…chenkov

Optimize the codegen for `Span::from_expansion`

bors added a commit to rust-lang-ci/rust that referenced this pull request May 1, 2025
Rollup of 10 pull requests

Successful merges:

 - rust-lang#134034 (handle paren in macro expand for let-init-else expr)
 - rust-lang#138703 (chore: remove redundant words in comment)
 - rust-lang#139186 (Refactor `diy_float`)
 - rust-lang#139343 (Change signature of File::try_lock and File::try_lock_shared)
 - rust-lang#139780 (docs: Add example to `Iterator::take` with `by_ref`)
 - rust-lang#139802 (Fix some grammar errors and hyperlinks in doc for `trait Allocator`)
 - rust-lang#140034 (simd_select_bitmask: the 'padding' bits in the mask are just ignored)
 - rust-lang#140062 (std: mention `remove_dir_all` can emit `DirectoryNotEmpty` when concurrently written into)
 - rust-lang#140485 (Optimize the codegen for `Span::from_expansion`)
 - rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses)

r? `@ghost`
`@rustbot` modify labels: rollup
bors added a commit to rust-lang-ci/rust that referenced this pull request May 2, 2025
…iaskrgr

Rollup of 9 pull requests

Successful merges:

 - rust-lang#140485 (Optimize the codegen for `Span::from_expansion`)
 - rust-lang#140509 (transmutability: merge contiguous runs with a common destination)
 - rust-lang#140519 (Use select in projection lookup in `report_projection_error`)
 - rust-lang#140521 (interpret: better error message for out-of-bounds pointer arithmetic and accesses)
 - rust-lang#140536 (Rename `*Guard::try_map` to `filter_map`.)
 - rust-lang#140550 (Stabilize `select_unpredictable`)
 - rust-lang#140563 (extend the list of registered dylibs on `test::prepare_cargo_test`)
 - rust-lang#140572 (Add useful comments on `ExprKind::If` variants.)
 - rust-lang#140574 (Add regression test for 133065)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit a2ae171 into rust-lang:master May 3, 2025
6 checks passed
@rustbot rustbot added this to the 1.88.0 milestone May 3, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request May 3, 2025
Rollup merge of rust-lang#140485 - Jarcho:from_expansion_opt, r=petrochenkov

Optimize the codegen for `Span::from_expansion`

Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
8 participants